Universal Words and their relationship to Multilinguality, Wordnet and Multiwords

Author

  • Pushpak Bhattacharyya
Abstract

In this article we address issues concerning the construction of the lexicon in the context of sentential knowledge representation in the Universal Networking Language (UNL), an interlingua proposed in 1996 for machine translation. Lexical knowledge in UNL takes the form of Universal Words (UWs): concepts expressed mostly by English words, disambiguated and stored in the Universal Words repository. The UW dictionary is universal in the sense that it aims to store all concepts of all languages of all times. After many incarnations of UW dictionaries were built in many places around the world, the UW dictionary based on the content and structure of the English wordnet seems to have found unanimous acceptance in the UNL community. We point out that some of the concerns that challenge wordnet building also challenge the construction of the UW dictionary, namely multilinguality and multiwords. We discuss possible solutions to these challenges, noting along the way that the challenges to UW dictionary construction arise from the ambition of the UW repository to be universal.

Similar resources

Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources

In this paper we present the mapping between WordNet domains and WordNet topics, and the emergent Wikipedia categories. This mapping leads to a coarse alignment between WordNet and Wikipedia, useful for producing domain-specific and multilingual corpora. Multilinguality is achieved through the cross-language links between Wikipedia categories. Research in word-sense disambiguation has shown tha...

Multiwords and Word Sense Disambiguation

This paper studies the impact of multiword expressions on Word Sense Disambiguation (WSD). Several identification strategies of the multiwords in WordNet2.0 are tested in a real Senseval-3 task: the disambiguation of WordNet glosses. Although we have focused on Word Sense Disambiguation, the same techniques could be applied in more complex tasks, such as Information Retrieval or Question Answer...

sranjans : Semantic Textual Similarity using Maximal Weighted Bipartite Graph Matching

The paper aims to come up with a system that examines the degree of semantic equivalence between two sentences. At the core of the paper is the attempt to grade the similarity of two sentences by finding the maximal weighted bipartite match between the tokens of the two sentences. The tokens include single words, or multiwords in case of Named Entities, adjectivally and numerically modified wo...

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

A Very Large Dictionary With Paradigmatic, Syntagmatic, And Paronymic

A very large Russian dictionary is described. It currently contains 3.6 million links between its 120,000 entries. The links are syntagmatic (collocations), paradigmatic (WordNet-like), or paronymic (words similar in letters or in morphs). The entries of the dictionary are single words or multiwords belonging to four main POS. The entries represent so-called grammemes rather than lexemes: e.g., nouns ...

Publication date: 2013